Arize AI is a U.S.-based company specializing in machine learning observability. Its platform tracks hundreds of billions of predictions every month for both well-known brands and innovative startups. Sensitive data, to say the least. In a market where Arize stands out for its technology – the company just raised $38M a few weeks ago – it also shines for its top-level security.

On the occasion of Arize’s new bug bounty with Yogosha, we met with Remi Cattiau, the company’s CISO. He shared his views on security at Arize and the machine learning operations (MLOps) space.

What is Arize?

Arize is a leader and pioneer in machine learning observability. Arize’s model monitoring tools enable seamless ML performance management, drift detection, data quality checks, and model validation. 

Many of our clients think of Arize as critical infrastructure and a key part of their machine learning stack because the platform helps them deploy models with confidence and stay a step ahead as facts on the ground change.

One of the best parts about Arize is that anyone can try it out since we have a robust free version of the product.

What safeguards and security processes have you established at Arize?

Arize’s security program is built on three pillars: auditability, prevention, and preparedness. Arize achieved SOC 2 Type II certification earlier this year. We actually developed our controls against the NIST 800-53 standard, a catalog of security controls from the U.S. federal government that exceeds SOC 2 requirements. PCI DSS compliance and the HITRUST certification are also on the horizon.

How do you keep an eye on your attack surface?

That’s the beauty of being a cloud company: we create a lot of things, and we have APIs for almost everything. We can leverage all trails from AWS and sinks from Google Cloud Platform – same thing, different names – to collect all the data from everything that runs on our cloud and then run automated controls on top of it. 

Previously, I created some open source projects related to capturing and analyzing all the trails produced by cloud providers. Arize leverages these kinds of scanning tools plus additional commercial tools to make sure nothing is out of line as we process millions or billions of data points every day.
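The kind of automated control described above can be sketched as a small scanner over CloudTrail records. This is a minimal illustration, not one of the tools mentioned: the watchlist of event names and the sample data are assumptions for the example.

```python
import json

# Illustrative watchlist: CloudTrail event names that often indicate
# risky configuration changes. Not an actual production rule set.
RISKY_EVENTS = {
    "AuthorizeSecurityGroupIngress",  # a security group was opened up
    "PutBucketPolicy",                # an S3 bucket policy changed
    "CreateAccessKey",                # new long-lived credentials minted
}

def flag_risky_events(trail_json: str) -> list[dict]:
    """Return the subset of CloudTrail records matching the watchlist."""
    records = json.loads(trail_json).get("Records", [])
    return [r for r in records if r.get("eventName") in RISKY_EVENTS]

# Sample trail with one benign and one risky event.
sample = json.dumps({"Records": [
    {"eventName": "DescribeInstances",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/dev"}},
    {"eventName": "CreateAccessKey",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/dev"}},
]})
print([r["eventName"] for r in flag_risky_events(sample)])
```

In practice a control like this would run continuously against every account's trail and page someone (or auto-remediate) on a match, rather than just printing.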

We also run a Kubernetes environment with intrusion detection systems (IDS) to analyze it. This technology gives really fine-grained control over what a process can do. We’re able to track the network calls each process makes and restrict them using Cilium policies or things like that. With Kubernetes, we can really go into much more detail than what was possible 10 years ago. It has really shifted the industry.
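As an illustration of the Cilium policies mentioned above, a `CiliumNetworkPolicy` can pin a workload's egress to exactly the peers it needs. The names, labels, and port here are hypothetical, not Arize's actual configuration:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: restrict-ingest-egress    # illustrative name
spec:
  endpointSelector:
    matchLabels:
      app: ingest                 # applies only to pods with this label
  egress:
    - toEndpoints:
        - matchLabels:
            app: kafka            # allow traffic to the broker only
      toPorts:
        - ports:
            - port: "9092"
              protocol: TCP
```

With a default-deny posture, anything the policy doesn't allow is dropped, which is exactly the "restrict what each process can do on the network" idea described above.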

Because almost everything revolves around APIs, we can achieve security through development and automation. This allows us to build really robust security programs without a massive team.

There are other efforts underway, but those are the highlights.

Why did you go for a bug bounty program?

Bug bounties are a good complement to other security programs. A pentest, for example, is more of a compliance thing. Some pentesters go really deep, but it’s mostly about scratching the surface to find basic vulnerabilities. With a bug bounty, it’s a different exercise entirely – you need to be confident in your security program!

Arize is at this stage: we’re confident in our security, so a bug bounty makes sense. Ultimately, it’s a way to attract more eyes to inspect the security of our platform. We plan to have really good rewards for critical vulnerabilities to entice the best hunters. Be warned, though: I don’t think it’s gonna be easy to break into our platform!

Editor’s note: Arize’s bug bounty was one of the programs selected for the Hunters Survival Game #2

How does bug bounty fit into your SDLC?

The bug bounty findings will integrate into our vulnerability remediation policy. We have one product, so it really simplifies the management of our bug bounty. It’s just another way to receive vulnerabilities. If something is critical, it will be patched with urgency.

The speed with which issues are fixed and the welcoming attitude of the development team at Arize are far above what I have seen in other companies. At Arize, the security team is basically just an extension of the development team – and, as I said, we automate almost everything. We’re more of a development team that focuses specifically on security and infrastructure matters, operating with a developer state of mind. We review vulnerability reports ourselves, we do the fixing and then we move forward.

To be honest, the way the company is structured and the quality and commitment of each Arize employee really makes my job easier!

How do you approach the relationship with ethical hackers and the triage of vulnerability reports?

Well, normally we are not flooded with vulnerability reports, so triage is not a common scenario! That said, triage is still really important.

I think there is a huge misunderstanding of the CVSS Base Score in the industry. I’ve seen several people rely on it alone, forgetting that the CVSS scoring methodology also includes Temporal and Environmental Scores that reflect the reality of the exploit in your environment. The Base Score is usually the worst thing that can happen. But if you factor in your secured infrastructure, you can reduce the risk. This is what really matters to the business: a reality check against a hypothetical.

For example, if a vulnerability in a library buried so deep in the backend that it’s not even exposed is rated critical, then in your environment it’s probably a medium. In these cases, it can be helpful to explain the context to ethical hackers.
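The downgrade logic described above can be sketched as a tiny severity adjuster. To be clear, this is a deliberate simplification for illustration, not the official CVSS Environmental Score formula:

```python
# Illustrative sketch: environmental context can downgrade a CVSS Base
# severity. The two-notch/one-notch rules here are assumptions, not the
# official CVSS Environmental metrics.
SEVERITY = ["none", "low", "medium", "high", "critical"]

def adjusted_severity(base: str, internet_exposed: bool,
                      data_sensitive: bool) -> str:
    level = SEVERITY.index(base)
    if not internet_exposed:
        level = max(level - 2, 0)  # unreachable code is far harder to exploit
    if not data_sensitive:
        level = max(level - 1, 0)  # lower impact if no sensitive data at risk
    return SEVERITY[level]

# A "critical" library bug deep in the backend, never exposed:
print(adjusted_severity("critical", internet_exposed=False,
                        data_sensitive=True))  # medium
```

The point is not the exact arithmetic but that exposure and impact in *your* environment, not the worst-case Base Score, drive the real priority.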

That’s why we share our vision and ensure that they understand our scoring. The manifest of the bug bounty also needs to be crystal clear. At the end of the day, it’s really a matter of explaining correctly how you see things. This is the foundation of a healthy relationship with hunters in my opinion.

Why Yogosha?

I personally have followed Yogosha from the beginning. I’m pleased with the team’s availability and professionalism. With pricing that is within the market rate, I see no reason to end our relationship.

The AI field moves fast. How do you balance innovation and security?

AI as a field does indeed move fast; it’s probably one of the fastest-moving technical fields out there. But the very structure of Arize allows us to keep pace, or even outrun it.

Since we are really customer-focused, we’re always looking for ways to help customers improve their models and their businesses. This is done by evolving our product. The good news is that in order to do so, Arize itself has to embrace cutting-edge technology. This allows the security team to be plugged directly into the core of the platform, making most security tasks super quick.

Since we automate most of our security controls, we can achieve scale and ensure the infrastructure follows our policy. For example, infrastructure-as-code (IaC) allows us to ensure network updates are validated by the security team. We also capture an inventory of what is running, along with its history. Again, we think of the security team as an extension of the engineering team with a specific focus on security and infrastructure – an infrastructure that is itself fully infrastructure-as-code.
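A check like "network updates must be validated by security" can be sketched as a gate over an IaC plan. This assumes a Terraform-style JSON plan; the resource types on the watchlist are illustrative, not Arize's actual policy:

```python
# Hypothetical sketch: scan a Terraform-style plan (rendered as JSON)
# and flag any change touching network resources for security review.
NETWORK_TYPES = {"aws_security_group", "aws_security_group_rule", "aws_vpc"}

def changes_needing_review(plan: dict) -> list[str]:
    """Return addresses of network-resource changes that need sign-off."""
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if change.get("type") in NETWORK_TYPES and actions != ["no-op"]:
            flagged.append(change.get("address", "<unknown>"))
    return flagged

plan = {"resource_changes": [
    {"address": "aws_security_group.web", "type": "aws_security_group",
     "change": {"actions": ["update"]}},
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"]}},
]}
print(changes_needing_review(plan))  # ['aws_security_group.web']
```

Wired into CI, a non-empty result would block the apply until the security team approves, which is what makes IaC such a natural enforcement point.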

So to answer your question: we don’t balance innovation and security, we just do both at the same time. It’s an end-to-end process. For example, we implemented a secured private network for a customer in just a day or two. In another company, that could have been a six-month project.

As a CISO, what do you think is the first thing to do in the event of a breach?

Arize’s third security pillar is preparedness. In developing a framework, Arize takes its cue from sectors with long histories of effective risk management. The airline industry is a great example. If you’re a pilot flying a plane and an engine stops, the first thing you are trained to do is find the checklist for this specific situation and then follow the steps. Those processes are thought out in advance, without stress and with the right state of mind. When you’re in the middle of a crisis, you need to accept the fact that a lot of other people – in a better position than you are in the heat of the moment – have already figured out the whole process.

In the event of a security breach, it’s no different. You go grab that checklist – for us it’s the “Security Breach Process” – and you follow each step.

What advice would you give to any company to reduce their digital risks?

My main advice is this: don’t take it for granted.

I’ve seen a lot of people explain how great their security policy is. But then you talk to the people on the ground, and you realize that the so-called policy isn’t really in effect.

If you deploy a new policy or control, get metrics to ensure its application. And verify with automated tests that it won’t deviate in time from the original intent.
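The "verify with automated tests" advice can be sketched as a recurring assertion over an asset inventory. The inventory shape and the MFA policy here are assumptions for the example:

```python
# Minimal sketch: continuously assert that a deployed control is actually
# applied, instead of taking the policy document at its word.
def violations(inventory: list[dict]) -> list[str]:
    """Return accounts that drift from the 'MFA enabled' policy."""
    return [a["name"] for a in inventory if not a.get("mfa_enabled", False)]

inventory = [
    {"name": "alice", "mfa_enabled": True},
    {"name": "ci-bot", "mfa_enabled": False},  # drift: fails the check
]
print(violations(inventory))  # ['ci-bot']
```

Run on a schedule against a live inventory, a check like this turns "our policy says MFA is required" into a metric you can actually watch drift over time.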

What are the biggest security challenges in AI?

AI relies on a lot of data; in the case of Arize, we are tracking close to a trillion predictions a month. Model inputs and features range from publicly-accessible data to far more sensitive information like personally identifiable information, personal health information, and credit card information. Given that most companies don’t run AI on trivial data that they don’t care about, we need to secure all data as if it were the most critical. That’s why we are so careful and ensure we follow every industry best practice.

What do you think can be improved in the future and what are Arize’s goals for 2022/2023?

Arize’s mission is to make AI work and work for the people. As the pace of innovation in AI accelerates, new innovations must be accompanied by robust end-to-end ML observability to improve model performance and ensure that AI is used fairly and ethically. In the coming years, Arize will continue to build out its core ML observability platform and scale efforts around monitoring unstructured data and uprooting racial bias. In terms of compliance, we will keep adding security and compliance programs to cover nearly every industry and ML use case.